Topic Classification for Short Texts
Abstract
In the context of TV and social media surveillance, constructing models to automate topic identification of short texts is a key task. This paper constructs models worth considering for practical use, employing a Top-K multinomial classification methodology. We describe the full data processing pipeline, covering dataset selection, text preprocessing, feature extraction, and model selection and learning, including hyperparameter optimization. We test and compare popular methods, including standard machine learning, deep learning, and fine-tuned BERT for topic classification.
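The abstract does not define its Top-K methodology in detail. A minimal sketch, assuming the common reading that a prediction counts as correct when the true topic is among the K highest-scoring classes, might look like the following. The function names, topic labels, and probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def top_k_labels(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k labels with the highest scores, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    return ranked[:k]


def top_k_accuracy(
    all_scores: Sequence[Dict[str, float]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of texts whose true topic is among the k best-scored topics."""
    if len(all_scores) != len(true_labels):
        raise ValueError("scores and labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    hits = sum(
        1
        for scores, truth in zip(all_scores, true_labels)
        if truth in top_k_labels(scores, k)
    )
    return hits / len(true_labels)


# Hypothetical per-class probabilities for three short texts
# (illustrative values only, not results from the paper).
example_scores = [
    {"politics": 0.6, "sports": 0.3, "weather": 0.1},
    {"politics": 0.5, "sports": 0.4, "weather": 0.1},
    {"politics": 0.2, "sports": 0.3, "weather": 0.5},
]
example_truth = ["politics", "sports", "politics"]

top1 = top_k_accuracy(example_scores, example_truth, k=1)
top2 = top_k_accuracy(example_scores, example_truth, k=2)
```

Under this assumption, the same scoring function can be applied to any of the compared model families, as long as each one outputs a score per topic.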
Similar resources
Topic Segmentation for Short Texts
Topic segmentation, which aims to find the boundaries between topic blocks in a text, is an important task for semantic analysis of texts. Although different solutions have been proposed for the task, many limitations and difficulties exist in the approaches. In particular most of the methods do not work well for such case as short texts, internet news and student's writings. In this paper, we f...
BUAP: Polarity Classification of Short Texts
We report the results we obtained at the subtask B (Message Polarity Classification) of SemEval 2014 Task 9. The features used for representing the messages were basically trigrams of characters, trigrams of PoS and a number of words selected by means of a graph mining tool. Our approach performed slightly below the overall average, except when a corpus of tweets with sarcasm was evaluated, in ...
Classification of Short Legal Lithuanian Texts
Statistical analysis of parliamentary roll call votes is an important topic in political science because it reveals ideological positions of members of parliament (MP) and factions. However, it depends on the issues debated and voted upon. Therefore, analysis of carefully selected sets of roll call votes provides a deeper knowledge about MPs. However, in order to classify roll call votes accord...
Multi-value Classification of Very Short Texts
We introduce a new stacking-like approach for multi-value classification. We apply this classification scheme using Naive Bayes, Rocchio and kNN classifiers on the well-known Reuters dataset. We use part-of-speech tagging for stopword removal. We show that our setup performs almost as well as other approaches that use the full article text even though we only classify headlines. Finally, we app...
Topic Modeling over Short Texts by Incorporating Word Embeddings
Inferring topics from the overwhelming amount of short texts becomes a critical but challenging task for many content analysis tasks, such as content charactering, user interest profiling, and emerging topic detecting. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this problem very well since only very limited word co-o...
Journal
Journal title: Lecture Notes in Information Systems and Organisation
Year: 2023
ISSN: 2195-4976, 2195-4968
DOI: https://doi.org/10.1007/978-3-031-32418-5_12